Current Issue: January - March, Volume: 2020, Issue Number: 1, Articles: 5
Background: Protein pulldown using Methyl-CpG binding domain (MBD) proteins followed by high-throughput sequencing is a common method to determine DNA methylation. Algorithms have been developed to estimate absolute methylation level from read coverage generated by affinity enrichment-based techniques, but the most accurate one for MBD-seq data requires additional data from an SssI-treated Control experiment.
Results: Using our previous characterizations of Methyl-CpG/MBD2 binding in the context of an MBD pulldown experiment, we build a model of expected MBD pulldown reads as drawn from SssI-treated DNA. We use the program BayMeth to evaluate the effectiveness of this model by substituting calculated SssI Control data for the observed SssI Control data. By comparing methylation predictions against those from an RRBS data set, we find that BayMeth run with our modeled SssI Control data performs better than BayMeth run with observed SssI Control data, on both 100 bp and 10 bp windows. When the model is adapted to an external data set solely by changing the average fragment length, our calculated data still informs the BayMeth program to a similar degree as observed data in predicting methylation state on a pulldown data set with matching WGBS estimates.
Conclusion: In both internal and external MBD pulldown data sets tested in this study, BayMeth used with our modeled pulldown coverage performs better than BayMeth run without the inclusion of any estimate of SssI Control pulldown, and is comparable to, and in some cases better than, using observed SssI Control data with the BayMeth program. Thus, our MBD pulldown alignment model can improve methylation predictions without the need to perform additional control experiments.
Background: Literature Based Discovery (LBD) produces more potential hypotheses than can be manually reviewed, making automatic ranking of these hypotheses critical. In this paper, we introduce the indirect association measures of Linking Term Association (LTA), Minimum Weight Association (MWA), and Shared B to C Set Association (SBC), and compare them to Linking Set Association (LSA), concept embeddings vector cosine, Linking Term Count (LTC), and direct co-occurrence vector cosine. Our proposed indirect association measures extend traditional association measures to quantify indirect rather than direct associations while preserving valuable statistical properties.
Results: We perform a comparison between several different hypothesis ranking methods for LBD, and compare them against our proposed indirect association measures. We intrinsically evaluate each method's performance using its ability to estimate semantic relatedness on standard evaluation datasets. We extrinsically evaluate each method's ability to rank hypotheses in LBD using a time-slicing dataset based on co-occurrence information, and another time-slicing dataset based on SemRep extracted relationships. Precision and recall curves are generated by ranking term pairs and applying a threshold at each rank.
Conclusions: Results differ depending on the evaluation methods and datasets, but it is unclear if this is a result of biases in the evaluation datasets or if one method is truly better than another. We conclude that LTC and SBC are the best suited methods for hypothesis ranking in LBD, but there is value in having a variety of methods to choose from.
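The evaluation step described in this abstract (ranking term pairs and applying a threshold at each rank) can be illustrated with a minimal Python sketch. The function name `precision_recall_at_each_rank`, the example term pairs, and the gold-standard set are hypothetical and chosen only for illustration; this is not the authors' evaluation code.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def precision_recall_at_each_rank(
    ranked_pairs: Sequence[Hashable],
    true_pairs: Set[Hashable],
) -> List[Tuple[float, float]]:
    """Return (precision, recall) after cutting the ranked list at each rank k.

    ranked_pairs: term pairs sorted from highest to lowest score (assumed input).
    true_pairs:   gold-standard pairs, e.g. pairs that appear in a later time slice.
    """
    points = []
    hits = 0
    for k, pair in enumerate(ranked_pairs, start=1):
        if pair in true_pairs:
            hits += 1
        precision = hits / k  # fraction of the top-k pairs that are correct
        recall = hits / len(true_pairs) if true_pairs else 0.0
        points.append((precision, recall))
    return points


# Hypothetical ranking of four candidate (A, C) term pairs, best score first.
example_ranking = [("a", "x"), ("a", "y"), ("a", "z"), ("a", "w")]
example_gold = {("a", "x"), ("a", "z")}
example_curve = precision_recall_at_each_rank(example_ranking, example_gold)
```

Each element of `example_curve` corresponds to one threshold; plotting recall against precision over all ranks would give one curve per ranking method.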
Background: Biomedical named entity recognition (BioNER) is a fundamental and essential task for biomedical literature mining, which affects the performance of downstream tasks. Most BioNER models rely on domain-specific features or hand-crafted rules, but extracting features from massive data requires considerable time and human effort. To solve this, neural network models are used to automatically learn features. Recently, multi-task learning has been applied successfully to neural network models of biomedical literature mining. For BioNER models, multi-task learning makes use of features from multiple datasets and improves the performance of models.
Results: In experiments, we compared our proposed model with other multi-task models and found that our model outperformed the others on datasets in the gene, protein, and disease categories. We also tested the performance of different dataset pairs to find the best partners for each dataset. In addition, we explored and analyzed the influence of different entity types by using sub-datasets. When dataset size was reduced, our model still produced positive results.
Conclusion: We propose a novel multi-task model for BioNER with a cross-sharing structure to improve the performance of multi-task models. The cross-sharing structure in our model makes use of features from both datasets in the training procedure. Detailed analysis of the best dataset partners and of the influence between entity categories can provide guidance for choosing proper dataset pairs for multi-task training.
Background: With the advent of array-based techniques to measure methylation levels in primary tumor samples, systematic investigations of methylomes have been widely performed on a large number of tumor entities. Most of these approaches are not based on measuring individual cell methylation but rather the bulk tumor sample DNA, which contains a mixture of tumor cells, infiltrating immune cells and other stromal components. This raises questions about the purity of a given tumor sample, given the varying degrees of stromal infiltration in different entities. Previous methods to infer tumor purity require or are based on the use of matching control samples, which are rarely available. Here we present a novel, reference-free method to quantify tumor purity, based on two Random Forest classifiers, which were trained on ABSOLUTE as well as ESTIMATE purity values from TCGA tumor samples. We subsequently apply this method to a previously published, large dataset of brain tumors, demonstrating that these models perform well in datasets that have not been characterized with respect to tumor purity.
Results: Using two gold standard methods to infer purity (the ABSOLUTE score based on whole genome sequencing data and the ESTIMATE score based on gene expression data), we have optimized Random Forest classifiers to predict tumor purity in entities that were contained in the TCGA project. We validated these classifiers using an independent test data set and cross-compared them to other methods which have been applied to the TCGA datasets (such as ESTIMATE and LUMP). Using Illumina methylation array data of brain tumor entities (as published in Capper et al. (Nature 555:469-474, 2018)), we applied this model to estimate tumor purity and found that subgroups of brain tumors display substantial differences in tumor purity.
Conclusions: Random Forest-based tumor purity prediction is a well-suited tool to extrapolate gold standard measures of purity to novel methylation array datasets. In contrast to other available methylation-based tumor purity estimation methods, our classifiers do not need a priori knowledge about the tumor entity or matching control tissue to predict tumor purity.
Background: Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient's genome. This analysis relies on previously published information regarding the association of variations with disease progression and possible interventions. Clinicians rely heavily on biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact. We develop the Variant-Information Search Tool (VIST), a search engine designed for the targeted search of clinically relevant publications given an oncological mutation profile.
Results: VIST indexes all PubMed abstracts and content from ClinicalTrials.gov. It applies advanced text mining to identify mentions of genes, variants and drugs and uses machine learning-based scoring to judge the clinical relevance of indexed abstracts. Its functionality is available through a fast and intuitive web interface. We perform several evaluations, showing that VIST's ranking is superior to that of PubMed or a pure vector space model with regard to the clinical relevance of a document's content.
Conclusion: Different user groups search repositories of scientific publications with different intentions. This diversity is not adequately reflected in the standard search engines, often leading to poor performance in specialized settings. We develop a search engine for the specific case of finding documents that are clinically relevant in the course of cancer treatment. We believe that the architecture of our engine, relying heavily on machine learning algorithms, can also act as a blueprint for search engines in other, equally specific domains.
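For readers unfamiliar with the "pure vector space model" used as a comparison baseline in this abstract, the following Python sketch shows one common form of it: TF-IDF weighting with cosine similarity. It is an assumption about what such a baseline typically looks like, not VIST's implementation or the authors' exact baseline; the whitespace tokenizer, the smoothed IDF formula, and the three example documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Simplistic whitespace tokenizer (assumption; real systems use richer text mining).
    return text.lower().split()


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    # Cosine similarity between two sparse term-weight vectors.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query: str, docs: List[str]) -> List[int]:
    """Return document indices ordered by TF-IDF cosine similarity to the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (one of several common variants).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    doc_vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized
    ]
    # Query terms unseen in the collection get weight 0 and do not affect the score.
    query_vector = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(query_vector, vec), i) for i, vec in enumerate(doc_vectors)]
    # Highest score first; ties broken by original document order.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical mini-collection and mutation-centred query.
example_docs = [
    "braf v600e melanoma vemurafenib response",
    "braf signaling in yeast models",
    "kras g12c lung cancer trial",
]
example_order = rank_documents("braf v600e vemurafenib", example_docs)
```

A ranking of this kind scores only term overlap; it has no notion of clinical relevance, which is the gap the abstract says VIST's learned scoring is meant to address.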